Created by
Marc Leprince
Created time
Aug 9, 2023 3:24 PM
Last edited time
Aug 16, 2023 8:22 PM
- Are software companies worth it? - from Jamin Ball’s Clouded Judgement blog
- A very interesting read on the financial view of tech companies: what they are worth and the revenue multiples they deserve based on stock prices, etc.
- Essentially, hitting a desirable cash-flow generation target is very hard to do, and because many tech companies won’t, there’s reason to believe many of them are overvalued.
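To make the "overvalued" argument concrete, here is a minimal discounted-cash-flow sketch. All numbers (revenue, multiple, margin, growth, discount rate, terminal multiple) are made up for illustration and are not from the post; the point is only that a rich revenue multiple implies very aggressive free-cash-flow assumptions.

```python
# Hypothetical illustration (all numbers are assumptions, not from the blog):
# what does a rich revenue multiple require of future free cash flow?

def implied_value(fcf_next_year: float, growth: float, discount: float,
                  years: int = 10, terminal_multiple: float = 15.0) -> float:
    """Present value of `years` of FCF growing at `growth`, plus a terminal value."""
    value = 0.0
    fcf = fcf_next_year
    for t in range(1, years + 1):
        value += fcf / (1 + discount) ** t
        fcf *= 1 + growth
    value += terminal_multiple * fcf / (1 + discount) ** years
    return value

# A company trading at 20x revenue with a 10% FCF margin: even a decade of
# 25% FCF growth discounted at 10% struggles to justify the price.
revenue, multiple, fcf_margin = 100.0, 20.0, 0.10
price = revenue * multiple
dcf = implied_value(revenue * fcf_margin, growth=0.25, discount=0.10)
print(f"price={price:.0f}, DCF value at 25% growth={dcf:.0f}")
```

Under these assumed inputs the DCF value comes out well below the market price, which is the gap the blog's overvaluation argument rests on.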
- Strategies & Techniques for Debugging SAS Program Errors & Warnings - 2013 WUSS white paper
- Covers a good framework for categorizing errors: compilation errors (syntax, semantic (e.g., misspellings), or macro) vs. runtime errors (execution-time (e.g., data not sorted), data, or macro)
- Also contains useful data step options for debugging, e.g., the DSNFERR system option stops processing when a referenced data set does not exist
- Data Architecture Evolution - Blog from Diogo Santos
- Data warehouse as the first basic architecture - SQL is the foundation: traditional sources, ETL into a target with change data capture (CDC) processes, then serve the data up in reports/visualizations
- Pain points: ETL jobs & tables quickly sprawl into disarray without strict controls, and the model doesn’t allow for big data or unstructured data. On/offboarding sources is a pain. Schema-on-write
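The ETL-with-CDC flow above can be sketched in a few lines. This is a simplified, in-memory illustration; the table, field names, and timestamp-based change detection are my assumptions, not from the blog (real CDC often reads database logs instead).

```python
# Minimal sketch of a timestamp-based CDC warehouse load (illustrative names).

def extract_changes(source_rows, last_load_ts):
    """Incremental extract: pull only rows modified since the previous load."""
    return [r for r in source_rows if r["updated_at"] > last_load_ts]

def transform(row):
    """Apply warehouse conformance rules, e.g., normalize names."""
    return {**row, "customer_name": row["customer_name"].strip().title()}

def load(target, rows):
    """Upsert by primary key so re-running the job is idempotent."""
    for r in rows:
        target[r["customer_id"]] = r

source = [
    {"customer_id": 1, "customer_name": " ada lovelace ", "updated_at": 5},
    {"customer_id": 2, "customer_name": "alan turing",    "updated_at": 9},
]
warehouse = {}
changed = extract_changes(source, last_load_ts=6)   # only row 2 changed since ts 6
load(warehouse, [transform(r) for r in changed])
print(warehouse)
```

The schema-on-write pain point shows up in `transform`: every source must be mapped to the warehouse schema before loading, which is what makes onboarding new sources slow.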
- Data lakes were created to help data scientists with model-training processes
- need access to a large variety & volume of data
- often use HDFS & other parallel processing frameworks
- uses ELT (schema-on-read)
- after the Transform step, the lake feeds DWs or feature stores
- can contain “layers” of progressively cleansed data: raw, bronze, silver, gold…
- also offloads the T in ELT to data science teams, away from being an IT responsibility
- but a data lake can quickly turn into a swamp if it isn’t cleaned and maintained properly
- tough to track the sources of datasets…
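A tiny sketch of the layered pattern the notes mention: raw/bronze records are progressively cleansed into silver and aggregated into gold. The record shapes and rules here are hypothetical, chosen only to show schema-on-read and layer promotion.

```python
# Illustrative "medallion" layering: bronze (raw) -> silver (cleansed) -> gold (curated).

bronze = [  # raw ingested events; schema-on-read means nothing is validated yet
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "oops"},     # bad record slips into the raw layer
    {"user": "a", "amount": "4.5"},
]

def to_silver(records):
    """Cleansed layer: parse types, drop records that fail validation."""
    out = []
    for r in records:
        try:
            out.append({"user": r["user"], "amount": float(r["amount"])})
        except ValueError:
            pass  # in practice, route to a quarantine table for review
    return out

def to_gold(records):
    """Curated layer: business-level aggregate ready for a DW or feature store."""
    totals = {}
    for r in records:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)   # totals per user after cleansing
```

The swamp risk is visible here too: if `to_silver` silently drops records with no quarantine or lineage, nobody can later say where the gold numbers came from.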
- Cloud data lake
- real-time (RT) structures like Kafka emerge, & cloud tech makes integrating technologies easier & cheaper
- to offer all this functionality there is a lot of complexity behind the scenes, & it can take a long time to produce sharable data
- Data Mesh Architecture
- Newest approach: applies the microservices answer to monolithic applications to data architecture
- data is decentralized & ownership is spread across domains that are responsible for modeling, storage, governance & architecture…
- Domains - a “microservice” of people responsible for data products, infrastructure, governance, and a Mesh API
- It becomes a mesh when all these data components feed one another and it is all trackable
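One way to picture a domain-owned data product is as a published contract carrying the metadata a mesh needs to stay trackable: owner, domain, schema, and upstream products. Everything below (class shape, field names, the example products) is an illustrative sketch, not an API from the blog.

```python
# Hypothetical sketch of a data-mesh "data product" with trackable lineage.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    domain: str            # owning domain, e.g. a "sales" team
    owner: str             # accountable person/role in that domain
    schema: dict           # published contract consumers can rely on
    upstream: list = field(default_factory=list)  # products feeding this one

    def lineage(self) -> list:
        """Walk upstream products so any output is traceable to its sources."""
        seen = []
        for p in self.upstream:
            seen.extend(p.lineage())
            seen.append(p.name)
        return seen

orders = DataProduct("orders", "sales", "sales-data-team", {"order_id": "int"})
revenue = DataProduct("revenue", "finance", "finance-data-team",
                      {"month": "str", "total": "float"}, upstream=[orders])
print(revenue.lineage())   # upstream products feeding "revenue"
```

The key contrast with the lake: ownership and governance live on each product inside its domain, rather than in a central IT team.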
- Linux Tips from Paul Brown
- Excellent and useful tips and explanations of how . [] {} <> and other special characters are used throughout bash commands, and how to read and use them.
- BCBS Progress Report (2019) - Deloitte commentary on the BCBS (BIS) report for G-SIB banks
- Focuses on governance and infrastructure, risk-data aggregation capabilities, and risk-reporting practices —> it’s clear regulators’ standards have increased for the quality & availability of data
- Governance - establishing clear ownership and accountability for data, with independent units for validating data quality and data management (e.g., a CDO setting data-quality standards)
- Data architecture & IT infra - banks lack compliance and need to invest resources in talent and tools to support the requirements
- Accuracy & integrity, timeliness, and adaptability of data all depend on data architecture & IT infra
- banks should develop a comprehensive strategy, with gap analyses and roadmaps to remediation that state the clear issues they are trying to resolve
- replace EUCs (end-user computing tools, e.g., spreadsheets) with more integrated reporting platforms
- upgrade data dictionaries
- need to be able to understand the transformations that occur from a data source to a report in production, and the effectiveness of the business process
- refine CDEs (critical data elements) - identify CDEs that are key to the org at 2 levels (corporate & business); these can be used to prioritize the assignment of controls and reviews
- data lineage with a clear ownership model throughout the data lifecycle
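The source-to-report traceability above can be sketched as a small lineage graph where each edge records the transformation applied and the owner of that step. The system names, transformations, and owners below are hypothetical, invented to illustrate the idea.

```python
# Hypothetical source-to-report lineage graph with per-step ownership.

edges = [
    # (from, to, transformation, owner of the step)
    ("core_banking.loans", "stg_loans", "CDC extract", "it-ingestion"),
    ("stg_loans", "risk.exposures", "aggregate by counterparty", "risk-data"),
    ("risk.exposures", "board_risk_report", "format & publish", "risk-reporting"),
]

def trace(target, edges):
    """Walk backwards from a report to every contributing source."""
    path = []
    frontier = [target]
    while frontier:
        node = frontier.pop()
        for src, dst, transformation, owner in edges:
            if dst == node:
                path.append((src, dst, transformation, owner))
                frontier.append(src)
    return list(reversed(path))

for src, dst, t, owner in trace("board_risk_report", edges):
    print(f"{src} -> {dst}: {t} (owner: {owner})")
```

With lineage captured this way, a regulator’s question about a number in the board report maps directly to a chain of transformations, each with an accountable owner.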